The dataset describes red wine quality. It contains 1,599 observations (wines) of
12 variables: 11 chemical properties plus ‘quality’, a score based on sensory
data that ranges between 0 (very bad) and 10 (very excellent).
## Length Class Mode
## 0 NULL NULL
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
## 'data.frame': 1599 obs. of 14 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## $ quality_f : Factor w/ 6 levels "3","4","5","6",..: 3 3 3 4 3 3 3 5 5 3 ...
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality" "quality_f"
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality quality_f
## Min. : 8.40 Min. :3.000 3: 10
## 1st Qu.: 9.50 1st Qu.:5.000 4: 53
## Median :10.20 Median :6.000 5:681
## Mean :10.42 Mean :5.636 6:638
## 3rd Qu.:11.10 3rd Qu.:6.000 7:199
## Max. :14.90 Max. :8.000 8: 18
##
## 3 4 5 6 7 8
## 10 53 681 638 199 18
All chemical variables are numeric; only quality (and the row index X) is an
integer. I created a variable named ‘quality_f’ that stores quality as a factor.
In the dataset, the ‘quality’ variable ranges from 3 to 8. The results above show
how many red wines received each quality score. We can see that most red wines
have a quality score of 5 or 6.
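As a minimal sketch of the factor conversion (not the original code), the snippet below rebuilds a quality vector from the counts printed in the table above and converts it to a factor. The names `quality_counts` and `share_5_6` are introduced here only for illustration.

```r
# Quality counts as reported in the table above (scores 3 to 8).
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Rebuild an integer vector with the same distribution
# (an illustrative stand-in for wine$quality).
quality <- rep(as.integer(names(quality_counts)), times = quality_counts)

# Convert to a factor; the levels are the observed scores in ascending order.
quality_f <- factor(quality)
levels(quality_f)
table(quality_f)

# Share of wines scored 5 or 6, in percent.
share_5_6 <- mean(quality %in% c(5, 6))
round(100 * share_5_6, 1)
```

Because `factor()` only creates levels for values that occur, the unused scores 0 to 2 and 9 to 10 do not appear as levels.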
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
There’s a peak around 9.2 - 9.8 in the distribution of the ‘alcohol’ variable. I
also noticed that a few wines have extremely high alcohol (above 14, mostly
between 14.5 and 15.0) or extremely low alcohol (below 9). Let’s look at these
outliers in alcohol.
## wine$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.400 9.725 9.925 9.955 10.580 11.000
## --------------------------------------------------------
## wine$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 9.60 10.00 10.27 11.00 13.10
## --------------------------------------------------------
## wine$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.5 9.4 9.7 9.9 10.2 14.9
## --------------------------------------------------------
## wine$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.80 10.50 10.63 11.30 14.00
## --------------------------------------------------------
## wine$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.20 10.80 11.50 11.47 12.10 14.00
## --------------------------------------------------------
## wine$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.80 11.32 12.15 12.09 12.88 14.00
Judging by the group minimums and maximums, wines with extremely low alcohol
(below 9) appear in quality categories 3, 5 and 6 (the minimum in category 4
is exactly 9.0), while wines with extremely high alcohol (14 or above) appear
only in quality categories 5, 6, 7 and 8.
## # A tibble: 6 x 4
## quality alco_mean alco_median n
## <int> <dbl> <dbl> <int>
## 1 3 9.955000 9.925 10
## 2 4 10.265094 10.000 53
## 3 5 9.899706 9.700 681
## 4 6 10.629519 10.500 638
## 5 7 11.465913 11.500 199
## 6 8 12.094444 12.150 18
I built a summary table, ‘wine.alco_by_quality’, describing alcohol grouped by
quality. I noticed that the best quality category (8) has the highest mean
alcohol (12.09) and the highest median alcohol (12.15).
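A grouped summary like this can be sketched in base R as follows. This is an assumption about the approach, not the original code: the tibble output above suggests dplyr was used, and `wine_head` is a hypothetical ten-row data frame copied from the `str()` output, so its numbers differ from the full-data table.

```r
# First ten rows of alcohol and quality, copied from the str() output above.
wine_head <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Split the rows by quality, then compute mean, median and count per group.
groups <- split(wine_head, wine_head$quality)
alco_by_quality <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    quality     = g$quality[1],
    alco_mean   = mean(g$alcohol),
    alco_median = median(g$alcohol),
    n           = nrow(g)
  )
}))
alco_by_quality
```

Reporting `n` next to each mean helps when reading the full table: the mean for quality 8 rests on only 18 wines, while the mean for quality 5 rests on 681.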
I looked at the mean and median of alcohol in each quality category, and I’m
curious to find out whether alcohol influences the quality of wine, and whether
other variables influence quality together with alcohol.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
Citric.acid is slightly skewed to the right. There’s a high peak in the
‘citric.acid’ distribution at 0.00. This is expected because citric.acid is
usually found only in small quantities in wine.
There are another 3 relatively small peaks in the distribution. I also noticed
an outlier at 1.00. Because citric.acid can add ‘freshness’ and flavor to
wines, I wonder whether higher citric.acid positively influences wine quality,
and whether the wine with citric.acid equal to 1 is of better quality.
## X fixed.acidity volatile.acidity citric.acid residual.sugar
## 152 152 9.2 0.52 1 3.4
## chlorides free.sulfur.dioxide total.sulfur.dioxide density pH
## 152 0.61 32 69 0.9996 2.74
## sulphates alcohol quality quality_f
## 152 2 9.4 4 4
It surprised me that the wine with the maximum citric.acid has a quality of 4,
which does not count as good quality.
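One way to pull out the row with the maximum value is `which.max()`. The sketch below is illustrative only: `wine_head` is a hypothetical ten-row data frame copied from the `str()` output, so it returns a different wine than row 152 of the full data.

```r
# First ten rows of citric.acid and quality, copied from the str() output above.
wine_head <- data.frame(
  citric.acid = c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36),
  quality     = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# which.max() returns the index of the first maximum; use it to select the row.
top_citric <- wine_head[which.max(wine_head$citric.acid), ]
top_citric
```

Applied to the full data frame, the same pattern would be `wine[which.max(wine$citric.acid), ]`.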
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
The two histograms above show the distributions of the ‘fixed.acidity’ variable
(acids that do not evaporate readily) and the ‘volatile.acidity’ variable (the
amount of acetic acid in wine, which at high levels can lead to an unpleasant,
vinegar taste).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
The residual.sugar distribution is skewed to the right, with some outliers above
11. Most residual.sugar values are between 1 and 3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
Most of the chlorides are between 0.05 and 0.12.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
Around 75% of wines have a density at or below 0.9978 (the third quartile). The
median density is 0.9968 and the mean density is 0.9967, so the two are very close.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
pH is fairly normally distributed with a few outliers. The mean pH is 3.311,
and around 75% of wines have a pH at or below 3.40.
## quality_bucket
## Low (Rating 3 - 4) Medium (Rating 5 - 6) High (Rating 7 - 8)
## 63 1319 217
I created quality_bucket to group the quality ratings. Wines scoring 3 or 4 are
grouped into the “Low” quality_bucket, wines scoring 5 or 6 into the “Medium”
quality_bucket, and wines scoring 7 or 8 into the “High” quality_bucket.
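The bucketing can be sketched with `cut()`. This is one possible approach, not necessarily the original code; the quality vector is rebuilt from the counts reported earlier, and the break points are my assumption.

```r
# Rebuild quality scores from the reported counts for scores 3 to 8.
quality <- rep(3:8, times = c(10, 53, 681, 638, 199, 18))

# cut() uses right-closed intervals by default: (2,4], (4,6], (6,8].
quality_bucket <- cut(
  quality,
  breaks = c(2, 4, 6, 8),
  labels = c("Low (Rating 3 - 4)", "Medium (Rating 5 - 6)", "High (Rating 7 - 8)")
)
table(quality_bucket)
```

The resulting counts match the table above (63, 1319 and 217), which also shows how unbalanced the three groups are.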
There are 1599 wine observations in the dataset with 12 features
(fixed acidity, volatile acidity, citric acid, residual sugar, chlorides,
free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol and
quality). The output variable quality is based on sensory data, scoring between
0 and 10.
I set the ‘quality’ variable as an ordered factor. Its levels are shown
below:
(very bad) ----> (very excellent)
quality: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
In the dataset itself, quality only ranges between 3 and 8.
Other observations:
The main features of interest in my dataset are quality, alcohol and
citric.acid. I’d like to know which feature or combination of features best
predicts the quality of wine.
I suspect that alcohol or citric.acid, together with some combination of the
other variables, influences the quality of wine. This hypothesis may help me
build a predictive model for wine quality in the following analysis.
Features like density and pH will help support my investigation
because I suspect alcohol might influence the density of the wine, and pH
might be influenced by alcohol and citric.acid.
I created the ‘quality_f’ variable as a factor for further bivariate analysis,
and a quality bucket grouping qualities into ‘low’, ‘medium’ and ‘high’. I also
created a summary table named ‘wine.alco_by_quality’ to better see whether
alcohol and quality are correlated.
I found some outliers in the alcohol variable (below 9 or above 14). I also
noticed that the best quality category has the highest mean alcohol (12.09) and
the highest median alcohol (12.15). This alone does not establish a linear
relationship or correlation between alcohol and quality, so I will analyze them
further in the following section.
The citric.acid distribution has several peaks and is slightly skewed to the
right. The highest peak is at 0.00, and there are another 3 relatively small
peaks in the distribution. I also noticed an outlier at 1.00. I checked the
wine with 1.00 citric.acid and found that it has a quality of 4.
From this matrix, I noticed that among my variables of interest (alcohol,
quality, pH, density and citric.acid), there are some correlations I would
like to look at more closely: quality and alcohol, alcohol and pH,
citric.acid and density, citric.acid and pH, and citric.acid and quality. Their
correlation values seem to be larger than 0.3 or smaller than -0.3, which
suggests a meaningful correlation.
I removed the outliers in alcohol to see whether the relationship between
alcohol and quality would become stronger. It turned out to be only slightly
stronger, so it’s better to use Pearson’s correlation to test these two
variables. There may also be other variables involved in this relationship.
##
## Pearson's product-moment correlation
##
## data: wine$alcohol and wine$quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
The Pearson’s correlation result above shows a moderate positive correlation
between alcohol and quality (r ≈ 0.476). More specifically, wines with higher
alcohol tend to be of better quality.
### Relationship between citric.acid and density
ggplot(aes(x=citric.acid,y=density),data=wine)+
geom_point(alpha=0.3,size=1)+
geom_smooth(method=lm, se=FALSE, size=0.6)
cor.test(wine$citric.acid,wine$density)
##
## Pearson's product-moment correlation
##
## data: wine$citric.acid and wine$density
## t = 15.665, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3216809 0.4066925
## sample estimates:
## cor
## 0.3649472
There’s a meaningful but small positive correlation between citric.acid and density (r ≈ 0.36).
##
## Pearson's product-moment correlation
##
## data: wine$alcohol and wine$pH
## t = 8.397, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1582061 0.2521123
## sample estimates:
## cor
## 0.2056325
Alcohol and pH are only weakly correlated (r ≈ 0.21).
##
## Pearson's product-moment correlation
##
## data: wine$alcohol and wine$density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5322547 -0.4583061
## sample estimates:
## cor
## -0.4961798
There’s a moderate negative correlation between the alcohol and density
variables (r ≈ -0.496). More specifically, wines with higher alcohol tend to
have lower density.
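To make the link between the reported correlation and the test output explicit, the sketch below recomputes the t statistic and the 95% confidence interval from r and the sample size alone, using the standard formulas for Pearson's test (t = r·sqrt(df)/sqrt(1 - r²) and a Fisher z interval). The helper name `r_to_test` is hypothetical, and the inputs are the values printed above.

```r
# Recompute the Pearson test statistic and confidence interval from r and n.
r_to_test <- function(r, n, conf_level = 0.95) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)    # t statistic with n - 2 df
  p_value <- 2 * pt(-abs(t_stat), df)       # two-sided p-value
  z <- atanh(r)                             # Fisher z transform
  half <- qnorm(1 - (1 - conf_level) / 2) / sqrt(n - 3)
  ci <- tanh(c(z - half, z + half))         # back-transform the interval
  list(t = t_stat, df = df, p_value = p_value, ci = ci)
}

# Correlations reported above; n = 1599 wines.
alcohol_quality <- r_to_test(0.4761663, 1599)
alcohol_density <- r_to_test(-0.4961798, 1599)

alcohol_quality$t
alcohol_quality$ci
alcohol_density$t
alcohol_density$ci
```

With 1,599 observations even a modest r gives a very large t, so the tiny p-values indicate that the correlations are not zero, not that they are strong.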
##
## Pearson's product-moment correlation
##
## data: wine$citric.acid and wine$quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1793415 0.2723711
## sample estimates:
## cor
## 0.2263725
## wine$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0050 0.0350 0.1710 0.3275 0.6600
## --------------------------------------------------------
## wine$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0300 0.0900 0.1742 0.2700 1.0000
## --------------------------------------------------------
## wine$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2300 0.2437 0.3600 0.7900
## --------------------------------------------------------
## wine$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0900 0.2600 0.2738 0.4300 0.7800
## --------------------------------------------------------
## wine$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3050 0.4000 0.3752 0.4900 0.7600
## --------------------------------------------------------
## wine$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0300 0.3025 0.4200 0.3911 0.5300 0.7200
## # A tibble: 6 x 3
## quality citric_mean `n()`
## <int> <dbl> <int>
## 1 3 0.1710000 10
## 2 4 0.1741509 53
## 3 5 0.2436858 681
## 4 6 0.2738245 638
## 5 7 0.3751759 199
## 6 8 0.3911111 18
Better quality wines have a higher mean citric.acid.
Although citric.acid can add ‘freshness’ and flavor to wine, the correlation
between quality and citric.acid is weak (r ≈ 0.23). Still, the mean citric.acid
rises steadily with the quality score.
##
## Pearson's product-moment correlation
##
## data: wine$citric.acid and wine$alcohol
## t = 4.4188, df = 1597, p-value = 1.059e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.06121189 0.15807276
## sample estimates:
## cor
## 0.1099032
There is little correlation between citric.acid and alcohol (r ≈ 0.11).
##
## Pearson's product-moment correlation
##
## data: wine$pH and wine$density
## t = -14.53, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3842835 -0.2976642
## sample estimates:
## cor
## -0.3416993
There’s a meaningful but small negative correlation between pH and density (r ≈ -0.34).
##
## Pearson's product-moment correlation
##
## data: wine$citric.acid and wine$pH
## t = -25.767, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5756337 -0.5063336
## sample estimates:
## cor
## -0.5419041
pH and citric.acid have a moderate negative correlation of about -0.54.
From the ggcorr correlation matrix, I found there might be some meaningful
correlations between quality and alcohol, alcohol and pH, citric.acid and density,
citric.acid and pH, and citric.acid and quality.
I found that alcohol and quality have a moderate correlation: wines with
higher alcohol tend to be of better quality. The correlation is around 0.476.
There is little correlation between quality and citric.acid, but I found that
better quality wines have a higher mean citric.acid. For example, the mean citric.acid
of quality 8 wines is 0.3911, while the mean citric.acid of quality 4 wines is 0.1742.
citric.acid and density have a meaningful but small correlation of around 0.36.
pH and density have a meaningful but small correlation of around -0.34. More
specifically, when density increases, pH tends to decrease.
pH and citric.acid have a moderate negative correlation of around -0.5419.
pH and citric.acid have the strongest relationship among my findings.
It’s hard to read the results with so many different colors, so I created
quality_bucket for better visualization.
All three quality groups appear to follow the overall relationship between
density and alcohol.
The quality groups also follow the relationship between pH and density. The low
quality group clearly covers a narrower range of pH and density than the medium
and high quality groups.
The quality groups follow the relationship between pH and citric.acid. The low
quality group has a relatively wider range of citric.acid. I also noticed that
many medium quality wines have 0 citric.acid, compared to the low and high quality groups.
By calculating the r-squared value, I want to test whether alcohol, the variable
most strongly correlated with quality, yields an r-squared value high enough to
support a linear relationship with quality.
m1 <- lm(quality ~ alcohol, data = wine)
m2 <- lm(quality ~ alcohol + density, data = wine)
m3 <- lm(quality ~ alcohol + density + citric.acid, data = wine)
m4 <- lm(quality ~ alcohol + density + citric.acid + pH, data = wine)
summary(m1)$r.squared
## [1] 0.2267344
summary(m2)$r.squared
## [1] 0.2317266
summary(m3)$r.squared
## [1] 0.2576685
summary(m4)$r.squared
## [1] 0.2626409
I chose alcohol (which has the strongest correlation with quality among my
variables of interest) to test the linear relationship with quality.
Unfortunately, the r-squared value is not strong (0.22673).
But when I added each of the other variables of interest to this model, the
r-squared value did improve from 0.22673 to 0.2626.
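For a regression with a single predictor, r-squared is simply the square of the Pearson correlation, which is why a correlation of 0.476 gives an r-squared of only about 0.227. The sketch below checks this against the reported values and then on a hypothetical ten-row data frame, `wine_head`, copied from the `str()` output (its fitted values are illustrative, not the full-data results).

```r
# Reported values for the full data set.
r_alcohol_quality <- 0.4761663
r2_m1_reported    <- 0.2267344
r_alcohol_quality^2   # agrees with the reported r-squared up to rounding

# Same identity on the first ten rows shown in the str() output.
wine_head <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5)
)
fit <- lm(quality ~ alcohol, data = wine_head)
r2_fit <- summary(fit)$r.squared
r_head <- cor(wine_head$alcohol, wine_head$quality)

c(r_squared = r2_fit, cor_squared = r_head^2)
```

By the same identity, a single predictor would need a correlation of roughly 0.71 with quality before r-squared reached 0.5.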
m5 <- lm(density ~ alcohol, data = wine)
summary(m5)$r.squared
## [1] 0.2461944
The r-squared value (about 0.246) is too weak to support a strong linear
relationship between alcohol and density.
From the bivariate analysis I found that density and alcohol have a moderate
negative correlation. From the multivariate analysis, after adding the quality
groups to the plot, I found that all quality groups follow the relationship
between density and alcohol.
I noticed that among my variables of interest, alcohol has the strongest
relationship with quality, so I calculated its r-squared value. Although the
r-squared value between them is not strong (around 0.22673), it did improve
from 0.22673 to 0.2626 when I added further variables (density, citric.acid
and pH) to the model.
Based on the Pearson correlation value, I expected the r-squared value
between alcohol and quality to be strong, at least above 0.5. That expectation
was wrong: with a single predictor, r-squared equals the squared correlation,
and 0.476 squared is only about 0.227. It did surprise me that the r-squared
value increased every time I added another variable of interest to the model.
It also surprised me that the quality groups all follow the meaningful
relationships I found in the bivariate analysis. More specifically, the quality
groups follow the relationships between alcohol and density, density and pH, and
pH and citric.acid.
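One caveat on the rising r-squared: in ordinary least squares, r-squared can never decrease when a predictor is added, so an increase alone does not show that the new variable is useful. Adjusted r-squared applies a penalty for each extra predictor. The sketch below computes it from the r-squared values reported above, assuming n = 1599 and one to four predictors in the order they were added; the variable names are mine.

```r
# Reported r-squared values for models m1 to m4.
n  <- 1599
r2 <- c(m1 = 0.2267344, m2 = 0.2317266, m3 = 0.2576685, m4 = 0.2626409)
p  <- 1:4   # number of predictors in each model

# Adjusted r-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(cbind(r2, adj_r2), 4)
```

Here the adjusted values stay close to the raw ones and still rise with each step, because the sample is large compared with the number of predictors.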
I created a linear model. I used quality as the dependent variable and alcohol as
the independent variable. After I found that the r-squared value was not strong
enough, I added density, citric.acid and pH one at a time as independent
variables. The resulting r-squared value did improve, but was still not strong.
The model clearly shows the r-squared value each time a new variable is added,
so it’s easy to see whether the variables have a linear relationship with quality.
But this model has limitations. I didn’t include all the variables in the
dataset, so there may still be some major variable missing from the model.
Alcohol and density have a moderate negative correlation of around -0.496. Wines
with higher alcohol percentage by volume tend to have lower density (g / cm^3),
and all wine quality groups follow this relationship between alcohol and density.
Among my variables, alcohol has the strongest correlation with quality, around
0.476. Wines with higher alcohol percentage by volume tend to be of better
quality. But I did notice that wines scoring 5 fall slightly off the trend: their
mean alcohol (9.90) is lower than that of wines scoring 3 or 4. This might be
because there are still other variables (influencing quality together with
alcohol) that I didn’t discuss.
pH and citric acid (g / dm^3) have a moderate negative correlation of around -0.5419.
Wines with higher citric acid (g / dm^3) tend to have lower pH, and all wine quality
groups follow this relationship between pH and citric acid. Also, the low quality
group tends to have a wider range of citric acid (g / dm^3) than the medium and
high quality groups.
This Red Wine Quality dataset contains 1,599 observations of red wines. There are
12 variables in the dataset: 11 variables describing chemical properties of
these wines, and 1 output variable, wine quality, which was graded by experts
on a scale from 0 (very bad) to 10 (very excellent).
I’m interested in exploring how these chemical properties influence the quality
of wine. Through univariate, bivariate and multivariate analysis and statistical
tests, I examined different relationships between these variables.
Among the variables I examined, alcohol had the strongest correlation with wine
quality, around 0.476. Wines with higher alcohol percentage by volume tend to
be of better quality. Unfortunately, the r-squared value between alcohol and
quality is not strong (around 0.22673). But when I added the other variables of
interest one at a time into this model, the r-squared value did improve from
0.22673 to 0.2626.
I think the limitations of this dataset are one of the major challenges.
Among the 1,599 observations, 82.5% of wines received a score of 5 or 6.
Around 3.9% of wines received a score of 3 or 4, and 13.6% received a score
of 7 or 8. It would be better to have a wider spread of quality scores in the
dataset.
For future analysis, it would be interesting and meaningful to combine or
compare this dataset with the white wine dataset, to see how the correlations
between these chemical properties and quality differ.